pacman::p_load(knitr,
leaflet,
lubridate,
raster,
sf,
spatstat,
spNetwork,
tidyverse,
tmap)Take-home Ex01

1.0 Introduction & Objectives
In this Take-Home Exercise 1, we will be discovering factors affecting road traffic accidents in the Bangkok Metropolitan Region BMR by employing both spatial and spatio-temporal point patterns analysis methods.
The specific objectives of this take-home exercise are as follows:
To visualize the spatio-temporal dynamics of road traffic accidents in BMR using appropriate statistical graphics and geovisualization methods.
To conduct detailed spatial analysis of road traffic accidents using appropriate Network Spatial Point Patterns Analysis methods.
To conduct detailed spatio-temporal analysis of road traffic accidents using appropriate Temporal Network Spatial Point Patterns Analysis methods.
1.1 The Study Area
Thailand has four administrative levels:
Level 0 - Country - Whole of Thailand
Level 1 - Provinces (Changwat) - Total: 76 provinces, with Bangkok as special administrative area.
Level 2 - District (Amphoe)
Level 3 - Sub-district (Tambon)
Level 4 - Village (Muban)
In this exercise we will focus our analysis on the study area called the Bangkok Metropolitan Region (BMR), which consists of Bangkok and its five adjacent provinces, namely Nakhon Pathom, Pathum Thani, Nonthaburi, Samut Prakan, and Samut Sakhon.
To confirm the EPSG code for the study area, we can check it on epsg.io. Enter “Thailand” and we will yield several results under Projected CRS:
Indian 1954 / UTM zone 46N - EPSG 23946 with transformation: 1153
Indian 1954 / UTM zone 47N - EPSG 23947 with transformation 1153
Indian 1954 / UTM zone 48N - EPSG 23948 with transformation 1153
Indian 1975 / UTM zone 47N - EPSG 24047 with transformation 1812
Indian 1975 / UTM zone 48N - EPSG 24048 with transformation 1812
WGS 84 / UTM zone 47N - EPSG 32647
WGS 84 / UTM zone 48N - EPSG 32648
WGS 84 / PDC Mercator - EPSG 3832
We will be using Geodetic CRS WGS 84 as it is a global standard. It leaves us with the last three options. As will click into each result, we will observe that WGS 84 / PDC Mercator covers wide area of use, and perhaps, more suitable for seafare. Whereas for WGS 84 / UTM zone 47N and WGS 84 / UTM zone 48N, the area are more precise, defined by the Easting and Northing.
Considering Bangkok’s Coordinate (13.7563° N, 100.5018° E), we will use WGS 84 / UTM zone 47N - EPSG 32647, as its Northing of 13.7563° falls between the Equator and 84°N, and its Easting of 100.5018° falls between 96°E and 102°E, which are both in the area of use as indicated in the table above.
1.2 The Datasets
The following datasets are provided as part of the Take Home Exercise 1:
Aspatial
Thailand Road Accident [2019-2022] on Kaggle
- This dataset is a list of Thailand Road Accident between 2019 to 2022 in .csv format, where details like province, date/time of accident, cause of accident, type of vehicles and weather conditions are provided. The variables in this dataset will be used for analysis of the factors that contributes to the road traffic accidents.
Geospatial
Thailand Roads (OpenStreetMap Export) on HDX.
- This dataset contains all the Thailand Road network which we will use to extract relevant roads within the study area.
Thailand - Subnational Administrative Boundaries on HDX.
- This dataset will be used to set the boundaries for the study area, so as to exclude the data points that are irrelevant for this study.
2.0 Setting up the Environment
2.1 Installing and loading the required libraries
The code chunk below checks if the packages are installed. If the packages are not yet installed, it will proceed to install and subsequently load the libraries. If the packages are already installed, it will proceed to launch into the R environment.
| Packages | Description |
|---|---|
| knitr | For dynamic report generation |
| leaflet | For interactive map |
| lubridate | Functions to work with date-times and time-spans |
| raster | Reading, writing, manipulating, analyzing and modeling of spatial data. |
| sf | For importing, managing, ad handling geospatial data |
| spatstat | For Spatial Point Pattern Analysis (SPPA) |
| spNetwork | Perform spatial analysis on network |
| tidyverse | For aspatial data wrangling |
| tmap | For thematic mapping |
2.2 Setting Seeds to Ensure Reproducibility
The set. seed() function is used to set a Random seed which Pseudorandom number generators use when generating “random” numbers. By using this function, we ensure that the randomly generated numbers remain the same when the code are reproduced.
set.seed(12345)3.0 Importing and Wrangling the Data
3.1 Importing Aspatial Data and Converting it into Spatial Data
Importing the data without filtering:
acc <- read_csv("data/rawdata/thai_road_accident_2019_2022.csv")Importing the data, and conducting necessary filter, conversion to sf, transformation of CRS, and get the days component of the incident_datetime:
acc_sf <- read_csv("data/rawdata/thai_road_accident_2019_2022.csv") %>%
filter(!is.na(longitude) &longitude != "",!is.na(latitude) & latitude != "") %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(crs = 32647) %>%
mutate(Day = day(incident_datetime)) %>%
mutate(Month = month(incident_datetime,
label = TRUE,
abbr = TRUE)) %>%
mutate(Year = year(incident_datetime)) %>%
mutate(DaysOfWeek = wday(incident_datetime,
week_start = 1)) %>%
dplyr::select(c(2,5,8:21))- read_csv() of readr package to import the data in .csv format as tibble dataframe.
Before proceeding to filter the data, we observe that there is a total of 81,735 observations.
- 
dplyr::filter() to filter out rows that has “na” or is empty in value.
After filtering “na” and empty values, we are left with 81,376 observations. This means that we have lost about 0.44% of the data. This will not affect our analysis as 0.44% a small proportion of the total number of observations. (Rule of Thumb, not >5% lost)

st_as_sf(coords = c(“longitude”, “latitude”), crs = 4326) combines and longitude and latitude columns into geometry column.
before applying st_as_sf() function, we observe that the latitude and longitude are in decimal degrees, therefore, we assume it is WGS84 datum, with the EPSG code of 4326.

Notice that the number of variables changed from 18 to 17. This is because st_as_sf() function has combined the longitude and latitude columns in the original dataset into one column name geometry. The columns named longitude and latitude are no longer found in the data.

st_transform() to change the Coordinate Reference System (CRS) to the correct EPSG code of 32647.
lubridate() is used to wrangle the incident_datetime, which is in datetime format of POSIXct.
lubridate::month(): label = TRUE -> change it into factor. If we do not use label = TRUE, it will be sorted using alphabetical logic. If it is a factor, it will be sorted according to date/month logic from Jan to Dec.
The columns of Day, Month, Year, and DaysOfWeek are created.

Use dplyr::select() to select the relevant columns to retain.
The output R object is called acc_sf and it is a sf data frame.
After the wrangling the data, we will save them for future use, using the write_rds() of readr package. The step will help us save time of re-running the codes for importing and wrangling the raw data.
write_rds(acc_sf, "data/rds/acc_sf.rds")We can import the data using the read_rds() of the readr package:
acc_sf <- read_rds("data/rds/acc_sf.rds")3.2 Importing Spatial Data
3.2.1 Importing the Administrative Boundaries
Importing the Thailand - Subnational Administrative Boundaries using st_read() of sf package. Since we are only interested in the Bangkok Metropolitan Region (BMR) as our study area, we will extract only the six provinces by using filter(). We will use the layer name “tha_admbnda_adm1_rtsd_20220121” as ADM1 refers to the province level boundaries. Note: Thailand administrative level 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon) boundaries.
sab_P <- st_read(dsn = "data/rawdata",
layer = "tha_admbnda_adm1_rtsd_20220121") %>%
filter(ADM1_EN %in% c("Bangkok","Nakhon Pathom", "Pathum Thani", "Nonthaburi", "Samut Prakan", "Samut Sakhon")) %>%
dplyr::select(3,17) %>%
st_transform(crs = 32647)
write_rds(sab_P, "data/rds/sab_P.rds")We will use the following code chunk to take a quick look at selected BMR boundaries.
sab_P <- read_rds("data/rds/sab_P.rds")
plot(st_geometry(sab_P))
Since we are not very familiar with the Thailand map, we may want to double check the correctness by including the label with name of the six province and the coordinates.
# setting the colors for each Province
colors = c("Bangkok" = "#FF0000",
"Nakhon Pathom" = "#55FF00",
"Pathum Thani" = "#FFAA00",
"Nonthaburi" = "#0000FF",
"Samut Prakan" = "#7F00FF",
"Samut Sakhon" = "#00FFFF")
ggplot(data = sab_P) +
geom_sf(aes(fill = ADM1_EN)) +
scale_fill_manual(values = colors,
name = "Province") +
geom_text(aes(label = ADM1_EN,
geometry = geometry),
stat = "sf_coordinates",
size = 3)
If we wish to go to the district-level granularity, we may choose to import the ADM2 layer instead.
sab_C <- st_read(dsn = "data/rawdata",
layer = "tha_admbnda_adm2_rtsd_20220121") %>%
filter(ADM1_EN %in% c("Bangkok","Nakhon Pathom", "Pathum Thani", "Nonthaburi", "Samut Prakan", "Samut Sakhon")) %>%
dplyr::select(3,11,20) %>%
st_transform(crs = 32647)
write_rds(sab_C, "data/rds/sab_C.rds")Similarly, we will use ggplot2() to visualise the using the Level 2 - District (Amphoe) boundaries.
sab_C <- read_rds("data/rds/sab_C.rds")
ggplot(data = sab_C) +
geom_sf(aes(fill = ADM1_EN)) +
scale_fill_manual(values = colors,
name = "Province")
For creating the boundaries for the study area, ADM1_EN would be sufficient.
However, we may want to have ADM2_EN for detailed study for each districts within the BMR.
3.2.2 Creating owin object for BMR boundaries
bmr_owin <- as.owin(sab_P)
write_rds(bmr_owin, "data/rds/bmr_owin.rds")bmr_owin <- read_rds("data/rds/bmr_owin.rds")
plot(bmr_owin)
summary(bmr_owin)Window: polygonal boundary
single connected closed polygon with 13779 vertices
enclosing rectangle: [587893.5, 712440.5] x [1484413.7, 1579076.3] units
(124500 x 94660 units)
Window area = 7668990000 square units
Fraction of frame area: 0.65
Window Area / Size of the Area
Window Area = 7,668,990,000 square units (m2)
converting to km2 by dividing it by 1,000,000.
Window Area of BMR = ~7,668.99 km2
3.2.3 Importing the Road Networks
In this code chunk, we will import the Thailand Roads (OpenStreetMap Export) downloaded from HDX.
road <- st_read(dsn ="data/rawdata",
layer = "hotosm_tha_roads_lines_shp") Assigning EPSG code 4326 as the geometry shows longlat decimal degree between (-180 to 180). Next, we use st_transform() to re-project from Geographic CRS of WGS84 to Projected CRS of UTM Zone 47N. After exploring the data, we can see that most of the accidents occur on the four types of highway: “motorway”, “trunk”, “primary”,“secondary”.
roads_clean <- road %>%
filter(highway %in% c("motorway", "trunk", "primary","secondary")) %>%
dplyr::select(2:4,15) %>%
st_set_crs(4326) %>%
st_transform(crs = 32647)Saving road_clean as .rds:
write_rds(roads_clean, "data/rds/roads_clean.rds")sab_P <- read_rds("data/rds/sab_P.rds")
roads_clean <- read_rds("data/rds/roads_clean.rds")
roads_bmr <- st_intersection(sab_P,roads_clean)Saving the roads network in BMR:
write_rds(roads_bmr, "data/rds/roads_bmr.rds")roads_bmr <- read_rds("data/rds/roads_bmr.rds")
plot(st_geometry(roads_bmr))
plot(st_geometry(sab_P), add = T)
3.2.5 Extracting the accidents records within BMR
acc_sf <- read_rds("data/rds/acc_sf.rds")
acc_bmr <- st_intersection(sab_P,acc_sf)Saving the accidents records within BMR:
write_rds(acc_bmr,"data/rds/acc_bmr.rds")3.3 Visualising the Geospatial Data
Before jumping into the analysis, we will use plot(), ggplot() and tmap() to take a quick look.
sab_C <- read_rds("data/rds/sab_C.rds")
acc_bmr <-read_rds("data/rds/acc_bmr.rds")
plot(st_geometry(roads_bmr),
col = "grey")
plot(acc_bmr, add = T,
col = "red",
pch = 19,
cex = 0.1)
plot(st_geometry(sab_C), add = T)
highway_color = c("motorway" = "#FF00FF",
"trunk" = "#00FF00",
"primary" = "#FFFF00",
"secondary" = "#0000FF",
"tertiary" = "#33AABB",
"unclassified" = "#5500FF")
ggplot() +
geom_sf(data = sab_C,
color = "black",
size = 1.0) +
geom_sf(data = roads_bmr,
aes(color = highway),
size = 1) +
geom_sf(data = acc_bmr,
color = "red",
size = 0.1) +
labs(title = "Road Network and Accident Points",
x = "Longitude",
y = "Latitude") +
scale_color_manual(values = highway_color) 
tmap_mode('view')
tm_shape(acc_bmr) +
tm_dots(col ="red",
alpha = 0.5,
size = 0.1) +
tm_shape(roads_bmr) +
tm_lines(col = "highway",
palette = highway_color,
lwd = 2)